{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/RAG/advanced_rag/vanilla_rag_with_langchain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# Retrieval-Augmented Generation (RAG) with Milvus and LangChain\n",
    "\n",
    "This guide demonstrates how to build a Retrieval-Augmented Generation (RAG) system using LangChain and Milvus.\n",
    "\n",
    "The RAG system combines a retrieval system with a generative model to generate new text based on a given prompt. The system first retrieves relevant documents from a corpus using Milvus, and then uses a generative model to generate new text based on the retrieved documents.\n",
    "\n",
    "[LangChain](https://www.langchain.com/) is a framework for developing applications powered by large language models (LLMs). [Milvus](https://milvus.io/) is the world's most advanced open-source vector database, built to power embedding similarity search and AI applications.\n",
    "\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Google Colab preparation[optional]\n",
    "This is an optional step, if you want to run this notebook on Google Colab."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "! git clone https://github.com/milvus-io/bootcamp.git"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "import shutil\n",
    "src_dir = \"./bootcamp/bootcamp/RAG/advanced_rag/rag_utils\"\n",
    "dst_dir = \"./rag_utils\"\n",
    "shutil.copytree(src_dir, dst_dir)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "! pip install --upgrade langchain langchain-community langchain_milvus langchain-openai bs4"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Please prepare you [OPENAI_API_KEY](https://openai.com/index/openai-api/) in your environment variables.\n",
    "![](imgs/colab_api_key1.png)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### If you are running this notebook on Google Colab, you have to restart this session by `Cmd/Ctrl + M`, then press `.` to make the environment take effect."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "from google.colab import userdata\n",
    "import os\n",
    "\n",
    "os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "----\n",
    "## Get started"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "\n",
    "## Prepare the data\n",
    "\n",
    "We use the Langchain WebBaseLoader to load documents from web sources and split them into chunks using the RecursiveCharacterTextSplitter.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content='Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n\\n\\nTool use\\n\\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import bs4\n",
    "from langchain_community.document_loaders import WebBaseLoader\n",
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "# Create a WebBaseLoader instance to load documents from web sources\n",
    "loader = WebBaseLoader(\n",
    "    web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n",
    "    bs_kwargs=dict(\n",
    "        parse_only=bs4.SoupStrainer(\n",
    "            class_=(\"post-content\", \"post-title\", \"post-header\")\n",
    "        )\n",
    "    ),\n",
    ")\n",
    "# Load documents from web sources using the loader\n",
    "documents = loader.load()\n",
    "# Initialize a RecursiveCharacterTextSplitter for splitting text into chunks\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "\n",
    "# Split the documents into chunks using the text_splitter\n",
    "docs = text_splitter.split_documents(documents)\n",
    "\n",
    "# Let's take a look at the first document\n",
    "docs[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "As we can see, the document is already split into chunks. And the content of the data is about the AI agent."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## Build RAG chain with Milvus Vector Store\n",
    "\n",
    "We will initialize a Milvus vector store with the documents, which load the documents into the Milvus vector store and build an index under the hood."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "Search the documents in the Milvus vector store using a test query question. We will get the top 3 documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\\nSelf-Reflection#\\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449916508032204804}),\n",
       " Document(page_content='[8] Shinn & Labash. “Reflexion: an autonomous agent with dynamic memory and self-reflection” arXiv preprint arXiv:2303.11366 (2023).\\n[9] Laskin et al. “In-context Reinforcement Learning with Algorithm Distillation” ICLR 2023.\\n[10] Karpas et al. “MRKL Systems A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.” arXiv preprint arXiv:2205.00445 (2022).\\n[11] Nakano et al. “Webgpt: Browser-assisted question-answering with human feedback.” arXiv preprint arXiv:2112.09332 (2021).\\n[12] Parisi et al. “TALM: Tool Augmented Language Models”\\n[13] Schick et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761 (2023).\\n[14] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.\\n[15] Li et al. “API-Bank: A Benchmark for Tool-Augmented LLMs” arXiv preprint arXiv:2304.08244 (2023).', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449916508032204861}),\n",
       " Document(page_content='Fig. 3. Illustration of the Reflexion framework. (Image source: Shinn & Labash, 2023)\\nThe heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.\\nSelf-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449916508032204807})]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from rag_utils.vanilla import vectorstore\n",
    "\n",
    "vectorstore.add_documents(docs)\n",
    "\n",
    "query = \"What is self-reflection of an AI Agent?\"\n",
    "vectorstore.similarity_search(query, k=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "Use the LCEL(LangChain Expression Language) to build a RAG chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Self-reflection in an AI agent refers to the ability of the agent to improve iteratively by refining past action decisions and correcting previous mistakes. It is a vital aspect in real-world tasks where trial and error are inevitable. In the context of the Reflexion framework, self-reflection is created by showing two-shot examples to the Large Language Model (LLM). Each example is a pair of a failed trajectory and an ideal reflection for guiding future changes in the plan. These reflections are then added into the agent’s working memory, up to three, to be used as context for querying the LLM. This process allows the agent to learn from its past actions and improve its future performance.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from rag_utils.vanilla import format_docs, rag_prompt, llm\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "# Convert the vector store to a retriever\n",
    "retriever = vectorstore.as_retriever()\n",
    "\n",
    "\n",
    "# Define the RAG (Retrieval-Augmented Generation) chain for AI response generation\n",
    "rag_chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | rag_prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")\n",
    "\n",
    "# rag_chain.get_graph().print_ascii()\n",
    "\n",
    "# Invoke the RAG chain with a specific question and retrieve the response\n",
    "res = rag_chain.invoke(query)\n",
    "res"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}